
ARIMAX/SARIMAX Model for US stock indices and macroeconomic factors as exogenous variables

In this case, we are interested in predicting the performance of three stock indices: the S&P 500, the NASDAQ Composite, and the Dow Jones Industrial Average. We will use macroeconomic factors as exogenous variables to improve the accuracy of our predictions. The factors considered here are GDP, interest rates, inflation, and the unemployment rate.

Including these exogenous variables can help us to better understand how the stock market might be impacted by changes in the broader economy. For example, if GDP growth is predicted to increase, we might expect to see a corresponding increase in the stock market indices as well.
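One common way to write such a model, and the form that R's stats::arima() fits when exogenous regressors are passed through its xreg argument, is a linear regression on the exogenous variables whose errors follow an ARIMA process. Here y_t is the index value in quarter t, x_{j,t} is the j-th macroeconomic variable, B is the backshift operator (B y_t = y_{t-1}), and ε_t is white noise:

$$
y_t = \beta_0 + \sum_{j=1}^{k} \beta_j x_{j,t} + \eta_t,
\qquad
\phi(B)\,(1 - B)^d\,\eta_t = \theta(B)\,\varepsilon_t
$$

A SARIMAX model keeps the same regression part and adds seasonal AR and MA polynomials of period s to the error process (s = 4 for quarterly data):

$$
\phi(B)\,\Phi(B^s)\,(1 - B)^d\,(1 - B^s)^D\,\eta_t = \theta(B)\,\Theta(B^s)\,\varepsilon_t
$$

Each β_j is the expected change in y_t for a one-unit change in x_{j,t}, holding the other regressors fixed. In the GDP example above, a positive coefficient on GDP would correspond to higher index values when GDP rises.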

If the findings show that the endogenous and exogenous variables are not interdependent, meaning the macroeconomic variables help explain the stock indices without being driven by them in turn, then the ARIMAX model can be a good choice for predicting the stock market indices. (If the variables influence each other, a VAR model is the usual alternative.) If there is seasonality in the data, the SARIMAX model can be used to account for the seasonal variation; if there is no seasonality, the simpler non-seasonal ARIMAX model can be used instead.
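As a minimal sketch of this choice, the R code below fits both a non-seasonal and a seasonal model with base R's stats::arima() on simulated quarterly data, not the project's data. The regressors x1 and x2 are hypothetical, and the AR(1) orders are arbitrary illustrative choices; in practice the orders would come from the ACF/PACF analysis or an automated search.

```r
# Sketch only: simulated quarterly data with hypothetical regressors x1 and x2.
# stats::arima() with xreg fits a linear regression with ARMA errors.
set.seed(42)
n <- 80  # 20 years of quarterly observations

# Two hypothetical exogenous regressors
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Response = 1 + 2 * x1 - x2 + AR(1) errors (no seasonality by construction)
errors <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- ts(1 + 2 * X[, "x1"] - X[, "x2"] + errors,
        start = c(2010, 1), frequency = 4)

# ARIMAX: non-seasonal AR(1) errors plus the regressors
fit_arimax <- arima(y, order = c(1, 0, 0), xreg = X)

# SARIMAX: the same model plus a seasonal AR(1) term at period 4
fit_sarimax <- arima(y, order = c(1, 0, 0),
                     seasonal = list(order = c(1, 0, 0), period = 4),
                     xreg = X)

# Compare information criteria; lower AIC means a better fit/complexity trade-off
aic <- c(ARIMAX = fit_arimax$aic, SARIMAX = fit_sarimax$aic)
print(round(coef(fit_arimax), 2))
print(round(aic, 1))
```

Because the simulated series has no seasonal component, the extra seasonal term is not expected to improve the fit much. This mirrors the rule of thumb above: only keep the seasonal part when the data support it.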

Let’s examine the relationship between endogenous and exogenous variables before proceeding with the ARIMAX/SARIMAX model.

Code
# TSstudio provides ts_plot() for interactive time series plots
library(TSstudio)

# Plot the raw series on their original scales
ts_plot(index_factor_data,
        title = "Stock Prices and Macroeconomic Variables",
        Ytitle = "Values",
        Xtitle = "Year")
Code
# Keep the three adjusted index closes and the four macroeconomic variables
numeric_vars_index_factor_data <- c("DJI.Adjusted", "IXIC.Adjusted", "GSPC.Adjusted",
                                    "gdp", "interest", "inflation", "unemployment")
numeric_index_factor_data <- index_factor_data[, numeric_vars_index_factor_data]

# Standardize each column to mean 0 and standard deviation 1 (z-scores)
normalized_index_factor_data_numeric <- scale(numeric_index_factor_data)

# Rebuild a quarterly ts object starting in 2010 Q1 and plot it
normalized_index_factor_data <- ts(normalized_index_factor_data_numeric,
                                   start = c(2010, 1), frequency = 4)
ts_plot(normalized_index_factor_data,
        title = "Normalized Time Series Data for Stock Prices and Macroeconomic Variables",
        Ytitle = "Normalized Values",
        Xtitle = "Year")

The Stock Prices and Macroeconomic Variables plot displays the time series of the stock indices and macroeconomic variables from 2010 to 2022. Because the variables are measured in different units and on very different scales, the plot is hard to interpret: the series with the largest magnitudes dominate the y-axis and flatten the movements of the others. Rescaling the data to a common scale, for example with z-scores or percentage changes, removes this issue and gives a clearer view of the relationships and patterns in the data.
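Percentage changes are one of the two rescaling options mentioned above. The sketch below, using a made-up price series, converts a quarterly level series into quarter-over-quarter percentage changes, which are unitless and therefore comparable across series. Note that this also changes what is compared: growth rates rather than levels.

```r
# Sketch: quarter-over-quarter percentage changes of a made-up price series
prices <- ts(c(100, 102, 101, 105, 110), start = c(2010, 1), frequency = 4)

# 100 * (P_t - P_{t-1}) / P_{t-1}; the first quarter is lost to differencing
pct_change <- 100 * diff(prices) / head(as.numeric(prices), -1)
print(round(pct_change, 2))
```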

The Normalized Time Series Data for Stock Prices and Macroeconomic Variables plot shows the same variables as the first plot, after normalization. Normalization rescales variables to a common scale so that differences in units of measurement no longer dominate; min-max normalization, for example, maps each variable to the range 0 to 1. Here the data was instead standardized with R's scale() function, which transforms each variable into z-scores with a mean of 0 and a standard deviation of 1.
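To make the difference between the two approaches concrete, this small sketch applies both to a toy matrix whose two columns (illustrative values only) sit on very different scales: scale() produces z-scores, while the min-max version is written out by hand.

```r
# Toy matrix with two columns on very different scales (illustrative values)
m <- cbind(index = c(10000, 12000, 15000, 13000),
           rate  = c(0.5, 1.0, 2.5, 1.5))

# Standardization (what scale() does): (x - mean) / sd, column by column
z <- scale(m)

# Min-max normalization to [0, 1], column by column
min_max <- apply(m, 2, function(x) (x - min(x)) / (max(x) - min(x)))

print(round(colMeans(z), 10))   # column means of the z-scores
print(apply(z, 2, sd))          # column standard deviations of the z-scores
print(apply(min_max, 2, range)) # column minimum and maximum after min-max scaling
```

Both transformations put the columns on comparable footing; z-scores are unbounded but centered, whereas min-max values are bounded but sensitive to a single extreme observation.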

Normalizing the time series data is beneficial for several reasons. First, it keeps variables with large units or magnitudes from dominating comparisons, allowing a fair comparison between variables. Second, it can improve the numerical stability of model estimation, since variables with very large values or extreme fluctuations can disproportionately influence the fitting procedure. Lastly, it makes the model coefficients easier to interpret: with all variables on the same scale, each coefficient measures the change in the response, in standard deviations, per one-standard-deviation change in a predictor, so coefficients can be compared directly to assess their relative importance.
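The interpretability point can be checked with ordinary least squares on simulated data. In the sketch below, x_big and x_small are hypothetical predictors with very different units; their raw coefficients differ by orders of magnitude, while the coefficients fitted on z-scores are directly comparable and equal the raw slopes multiplied by sd(x) / sd(y).

```r
# Sketch on simulated data with two hypothetical predictors
set.seed(1)
n <- 200
x_big   <- rnorm(n, mean = 5000, sd = 1000)  # large-scale predictor
x_small <- rnorm(n, mean = 2, sd = 0.5)      # small-scale predictor
y <- 0.002 * x_big + 3 * x_small + rnorm(n)

# Fit on the original units
raw_fit <- lm(y ~ x_big + x_small)

# Fit on z-scores (as.numeric drops the matrix attributes returned by scale())
y_z     <- as.numeric(scale(y))
big_z   <- as.numeric(scale(x_big))
small_z <- as.numeric(scale(x_small))
std_fit <- lm(y_z ~ big_z + small_z)

print(round(coef(raw_fit), 4))
print(round(coef(std_fit), 4))

# Standardized slope = raw slope * sd(x) / sd(y)
manual <- coef(raw_fit)[c("x_big", "x_small")] * c(sd(x_big), sd(x_small)) / sd(y)
print(round(manual, 4))
```

The raw slopes suggest x_small matters far more, yet on the standardized scale x_big has the slightly larger effect, which is the comparison the paragraph above refers to.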

Cross-Correlation for the Variables and Selection of Feature Variables

Cross-correlation measures the relationship between two time series as a function of the lag between them, that is, how strongly one series at time t is related to the other at time t + k. In the context of ARIMAX modeling, it is often used for feature selection. To select feature variables for our analysis, we will first examine the correlations among all the variables through a heatmap, and then analyze the cross-correlation function (CCF) plots between the response variable and each exogenous variable.
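The lag interpretation can be illustrated with base R's ccf() on simulated data where x leads y by two periods. Per the R documentation, the lag k value returned by ccf(x, y) estimates the correlation between x[t + k] and y[t], so the peak should appear at lag -2. The series here are white noise by construction; real trending series are usually differenced or prewhitened first, because shared trends inflate the cross-correlations at every lag.

```r
# Sketch on simulated data: x leads y by two periods
set.seed(7)
n <- 300
x_full <- rnorm(n + 2)
x <- x_full[3:(n + 2)]                 # x[t] = x_full[t + 2]
y <- x_full[1:n] + rnorm(n, sd = 0.5)  # y[t] = x[t - 2] + noise

# ccf(x, y) at lag k estimates cor(x[t + k], y[t])
cc <- ccf(x, y, lag.max = 8, plot = FALSE)

# Lag with the largest absolute cross-correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
print(best_lag)
```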

Correlation Heatmap
Code
library(ggplot2)
library(reshape2)  # provides melt()

# Keep only the upper triangle of the correlation matrix (lower triangle set to NA)
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}

# Pearson correlations between all variables, rounded to two decimals
cormat <- round(cor(normalized_index_factor_data_numeric), 2)
upper_tri <- get_upper_tri(cormat)

# Long format with columns Var1, Var2, value; NA cells are dropped
melted_cormat <- melt(upper_tri, na.rm = TRUE)

# Base heatmap: blue for negative, red for positive correlations
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 12, hjust = 1)) +
  coord_fixed()

# Add the coefficient labels, strip axis decorations, and place the legend inside the panel
ggheatmap +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.6, 0.7),  # ggplot2 >= 3.5.0 prefers legend.position.inside for this
    legend.direction = "horizontal"
  ) +
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))

The heatmap reveals important insights into the relationships between the stock market indices and the economic indicators. The strong positive correlations between the stock market indices and inflation, along with the negative correlations with the unemployment rate, suggest that these variables may play a significant role in stock market movements. In contrast, the weaker correlations between the stock market indices and GDP and interest rates indicate that these variables may have less influence on stock market fluctuations. These findings guide the selection of exogenous variables for the ARIMAX/SARIMAX models used to understand and forecast stock market dynamics.
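One simple way to turn such a heatmap into a shortlist is to keep the candidate regressors whose absolute correlation with the response exceeds a threshold. The sketch below assumes a hypothetical helper, select_by_correlation(), and an arbitrary threshold of 0.5, applied to simulated data with placeholder names x1 to x3; it is a screening step only, since correlations between trending series can be inflated by shared trends.

```r
# Hypothetical helper: keep candidate regressors whose absolute Pearson
# correlation with the response is at least `threshold`
select_by_correlation <- function(response, candidates, threshold = 0.5) {
  r <- apply(candidates, 2, function(col) cor(col, response))
  names(r)[abs(r) >= threshold]
}

# Simulated example (not the project's data)
set.seed(3)
n <- 500
cand <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
resp <- cand[, "x1"] - cand[, "x2"] + rnorm(n, sd = 0.3)  # x3 is unrelated

selected <- select_by_correlation(resp, cand, threshold = 0.5)
print(selected)
```

Both a positively and a negatively correlated regressor pass the filter because the comparison uses the absolute value, matching how the heatmap discussion treats inflation and unemployment alike.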

Click to view ARIMAX/SARIMAX Model for Dow Jones index and macroeconomic factors as exogenous variables

Click to view ARIMAX/SARIMAX Model for NASDAQ Composite index and macroeconomic factors as exogenous variables

Click to view ARIMAX/SARIMAX Model for S&P 500 index and macroeconomic factors as exogenous variables